Search CORE

19 research outputs found

MaSiF: Machine learning guided auto-tuning of parallel skeletons

Author: Cole Murray
Collins Alexander
Fensch Christian
Leather Hugh
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013
Field of study

An OS-Based Alternative to Full Hardware Coherence on Tiled Chip-Multiprocessors

Author: Fensch Christian
Publication venue: The University of Edinburgh
Publication date: 01/01/2008
Field of study

Institute for Computing Systems ArchitectureThe interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these architectures from scaling to a larger number of cores. Tiled CMPs offer better scalability by integrating relatively simple cores with a lightweight point-to-point interconnect. However, such interconnects make snooping impractical and, thus, require alternative solutions to cache coherence. This thesis proposes a novel, cost-effective hardware mechanism to support shared-memory parallel applications that forgoes hardware maintained cache coherence. The proposed mech- anism is based on the key ideas that mapping of lines to physical caches is done at the page level with OS support and that hardware supports remote cache accesses. It allows only some controlled migration and replication of data and provides a sufficient degree of flexibility in the mapping through an extra level of indirection between virtual pages and physical tiles. The proposed tiled CMP architecture is evaluated on the SPLASH-2 scientific benchmarks and ALPBench multimedia benchmarks against one with private caches and a distributed direc- tory cache coherence mechanism. Experimental results show that the performance degradation is as little as 0%, and 16% on average, compared to the cache coherent architecture across all benchmarks for 16 and 32 processors

Edinburgh Research Archive

Patterns and Rewrite Rules for Systematic Code Generation (From High-Level Functional Patterns to High-Performance OpenCL Code)

Author: Dubach Christophe
Fensch Christian
Steuwer Michel
Publication venue
Publication date: 09/02/2015
Field of study

Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort. This results in a tension between achieving performance and code portability. Code is either tuned using device-specific optimizations to achieve maximum performance or is written in a high-level language to achieve portability at the expense of performance. We propose a novel approach that offers high-level programming, code portability and high-performance. It is based on algorithmic pattern composition coupled with a powerful, yet simple, set of rewrite rules. This enables systematic transformation and optimization of a high-level program into a low-level hardware specific representation which leads to high performance code. We test our design in practice by describing a subset of the OpenCL programming model with low-level patterns and by implementing a compiler which generates high performance OpenCL code. Our experiments show that we can systematically derive high-performance device-specific implementations from simple high-level algorithmic expressions. The performance of the generated OpenCL code is on par with highly tuned implementations for multicore CPUs and GPUs written by expertsComment: Technical Repor

arXiv.org e-Print Archive

CiteSeerX

Edinburgh Research Explorer

MaxPair: Enhance OpenCL Concurrent Kernel Execution by Weighted Maximum Matching

Author: Fensch Christian
O'Boyle Michael F.P.
Wen Yuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/02/2018
Field of study

Heriot Watt Pure

Edinburgh Research Explorer

Device-Hopping: Transparent Mid-Kernel Runtime Switching for Heterogeneous Systems

Author: Cole Murray
Fensch Christian
Metzger Paul
Seeker Volker
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/09/2021
Field of study

Edinburgh Research Explorer

Enforcing Deadlines for Skeleton-based Parallel Programming

Author: Aldinucci Marco
Bini Enrico
Cole Murray
Fensch Christian
Metzger Paul
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020
Field of study